ABSTRACT



A discussion of the nature of speech is presented, followed by a 
review of speech processing to date, with emphasis on the characteris- 
tics of speech which must be retained for intelligibility. Methods of 
measuring speech intelligibility are described. The relative merits of 
abrupt and gradual audio clipping of speech are investigated, and two-tone
and articulation test results are presented showing that there is
no significant difference between these methods of clipping with respect to
speech intelligibility. Translation of speech to radio frequencies, clipping,
filtering and retranslation to audio to improve the peak to average
value ratio of the audio signal prior to transmitting it through a
noisy channel is investigated. Two-tone and articulation test results
are presented showing that this processing results in a 20% improvement
in speech intelligibility over audio clipping and filtering.

1. Introduction.



In spite of all his attempts to sophisticate his systems of communications,
man has yet to devise a more effective means than ordinary
speech. While the redundancy and lack of logic of some aspects of speech
are obvious, there is no other means available to us that so effectively
performs the mission of a communications system, which is to transfer
thoughts or ideas from one human brain to another. No other method of
communication can so precisely indicate the exact meanings that the
individual "transmitting" desires the individual "receiving" to understand.
Speech is limited, of course, by language, vocabulary, and so
on.

When it is desired, however, to transmit thoughts, or to communicate,
over a distance of more than a few feet, we discover that speech
has further limitations or drawbacks. When we attempt to use speech in
an electronic communications system that is peak-power-limited, and to
transmit this speech in a noisy environment, we find that these drawbacks
can be serious impediments to effective communications. Hence,
for nearly forty years (25) men have been studying ways in which to
process speech to aid in achieving better communications. The main idea
has been to process speech in certain ways to remove its disadvantages
as a communications means, while retaining as much of its ability to
convey meaning to the listeners as possible. The measure of the success
of a speech processing system has been the degree by which intelligibility
is improved, for a given set of conditions, over unprocessed speech.
Generally there has not been much concern, through the years, over
obtaining high quality speech reproduction for communications purposes,
but only over obtaining high intelligibility.

In the succeeding sections there will be given a brief description
of the nature of speech and a review of what has been
done in speech processing to date, and with what results. Then there
will be a short discussion of methods of determining speech intelligibility,
followed by a description of, and comments on the value of, two
new ideas in speech processing. These ideas are the following:
First, it might be possible to reduce the distortion introduced by audio
speech clipping, which, as we will see, is a common method of speech
processing, by choosing a clipper with a gradual input-output characteristic,
rather than the normal one wherein clipping occurs abruptly at
some particular level. Second, it should be possible to improve the
intelligibility of an audio signal by translating it to radio frequencies
(that is, generating a single-sideband wave), then clipping it, filtering
it, and translating it back to the audio range again. The results of
intelligibility tests on these systems will be presented and discussed in the hope
of providing further understanding of speech and speech processing.

2. The Nature of Speech.

Speech can be compared to a modulated carrier signal (5), the nature
of which varies considerably with time. For the vowels or voiced sounds,
the carrier consists of tones generated by the vocal cords, while for the 
consonants or unvoiced sounds the carrier is like broadband noise (18). 
The modulation consists of: 

(a) Turning on and off the carrier. 

(b) Frequency modulation by emphasis, inflection and so on. 

(c) Modification of the harmonic content of the carrier. 

(d) Amplitude modulation. 

As with any other waveform, speech may be represented in the
frequency domain or the time domain. In the frequency domain we see
that for vowels, intensities are concentrated in one or more distinct 
frequency regions, called formant regions. Each vowel sound has its own 
set of characteristic formant regions, although these are not necessarily 
the same when the sound is uttered by different people. The consonants 
have components in the frequency domain that generally lie higher than 
those of the vowels and are of lower intensity. Here the intensities tend
to be scattered continuously over the spectrum, hence the noise-like 
description for the carrier of a consonant as given above (10). This 
distribution of the intensities in consonants is caused by the fact that 
they are not produced by the vibration of the vocal cords, as are vowels. 

The average intensity spectrum of speech is shown in fig. 1 (10).
Here we see a sharp drop after about 600 Hz. The formant regions are
typically below 3000 Hz. for adult speech, and for vowels three are usually
found (21). Figure 2 shows the formant regions for the ee sound in
"proceedings", where a fourth formant at 4000 Hz. is present (21).




Fig. 1. Intensity distribution of average speech.

Fig. 2. Spectrum of ee sound in "proceedings".



The formant regions occur at harmonics of the fundamental frequency 
of the voice, which ranges from about 90 Hz. for a deep-voiced man to 300
Hz. for a high-voiced woman (8).

As we will see in our discussion of speech processing, a great deal 
can be done to speech that will still yield intelligibility. For some 
time the search has been on to discover what elements in speech remain 
invariant under these sometimes radical alterations that still result in 
intelligibility. This search has narrowed down to the frequency spectrum.
Agreement has more or less been reached that if the formant regions are
not severely altered the intelligibility of the speech will not suffer unduly.
The most striking example of this is the formant vocoder. This
device locates and measures the energy in the formant regions. This 
information can be coded, transmitted, and intelligible speech reproduced 
at the receiver (21). In 1959 here at the U.S. Naval Postgraduate School, 
S.R. Wilde devised a scheme for speech synthesis using the formant re- 
gions that resulted in intelligible speech using only 140 Hz. of band- 
width. 

In these vocoders we see that the only information used from the
original wave is that contained in the power spectrum. It has been shown
that the information contained in the spectrum, the autocorrelation 
function, and the average number of zero crossings of the time domain 
waveform are all three equivalent, and that the formant movements can be 
approximated by the running averages of the number of zero crossings of 
the original and differentiated waves (2). 
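
The zero crossing measure described above can be illustrated with a minimal sketch. The sample rate, the two test tones and all function names are assumptions chosen for illustration and are not taken from reference (2); a first difference stands in for differentiation.

```python
import math

def zero_crossings(samples):
    """Count sign changes between consecutive non-zero samples."""
    count = 0
    prev = 0.0
    for s in samples:
        if s == 0.0:
            continue  # an exact zero carries no sign; skip it
        if prev != 0.0 and (s > 0) != (prev > 0):
            count += 1
        prev = s
    return count

def running_zero_crossings(samples, window):
    """Zero crossing count in each consecutive, non-overlapping window."""
    return [zero_crossings(samples[i:i + window])
            for i in range(0, len(samples) - window + 1, window)]

def difference(samples):
    """First difference, a discrete stand-in for differentiation."""
    return [b - a for a, b in zip(samples, samples[1:])]

RATE = 8000  # samples per second (assumed)
# One second of a 500 Hz tone plus a weaker 2500 Hz tone (assumed test wave).
wave = [math.sin(2 * math.pi * 500 * (n + 0.5) / RATE)
        + 0.5 * math.sin(2 * math.pi * 2500 * (n + 0.5) / RATE)
        for n in range(RATE)]

original_count = zero_crossings(wave)
differentiated_count = zero_crossings(difference(wave))
per_window = running_zero_crossings(wave, 800)  # one count per 100 ms
```

For this test wave the original count follows the stronger low tone, while the differenced wave, in which the high tone is emphasized, crosses zero far more often.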

3. Speech Processing, General. 

The subject of speech processing is generally concerned with answer- 
ing the following question: What characteristics of speech are undesirable, 
and what can be done to eliminate them, while not altering the power 
spectrum of the wave a great deal? In a peak power limited system we are
interested in a signal with a low peak to average value ratio. With
such a signal we can achieve the best average signal to average noise 
ratio when we attempt to transmit our signal through a noisy environ- 
ment. The normal peak to average ratio of speech, however, is 14.5 db 
(18). This is an undesirable feature of speech which we would like to 
eliminate. Also, as we have seen, speech covers a bandwidth of around 
5000 Hz. Obviously, it would be nice to reduce this if possible. The 
following two sections will discuss the efforts that have been put forth 
to accomplish these two objectives while still retaining intelligibility. 
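
A peak to average ratio of the kind quoted above can be computed from sampled data as in the following sketch. It assumes that "average" means the r.m.s. value and that the ratio is expressed as 20 log10 of a voltage ratio; the two test waves and the function name are illustrative only.

```python
import math

def peak_to_average_db(samples):
    """Ratio of peak magnitude to r.m.s. value, in db (assumed definition)."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(peak / rms)

N = 1000  # samples in one period of the fundamental
# A single tone, and a "peaky" wave made of five equal harmonics in phase.
tone = [math.sin(2 * math.pi * n / N) for n in range(N)]
peaky = [sum(math.cos(2 * math.pi * k * n / N) for k in range(1, 6))
         for n in range(N)]

tone_ratio = peak_to_average_db(tone)
peaky_ratio = peak_to_average_db(peaky)
```

Under these assumptions the single tone gives about 3 db and the five-harmonic wave about 10 db, which shows how a wave with occasional high peaks wastes the capacity of a peak power limited system.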

4. Audio Speech Processing.

The first step in the effort to reduce the peak to average ratio of 
speech was to clip the peaks of the speech wave. In 1946 J.R. Licklider 
found that for such a system as we have described maximum intelligibility 
is achieved by clipping the peaks of the speech wave and using the available
power for the rest of the wave. He also attempted center clipping,
wherein the center portion of the wave is removed and only the peaks are
passed. This, however, resulted in very poor intelligibility beyond a
few db of clipping (12).

The big difference between these two types of clipping is that peak clipping
does not alter the zero crossing characteristics of the time waveform
while center clipping does. This can be seen in fig. 3. Thus, as
we have seen, center clipping alters one of the invariants and we would
expect intelligibility to suffer. Licklider also performed various degrees
of linear rectification, shown in fig. 3(C), on speech signals and found
that articulation began to suffer just as half-wave rectification was
reached, or just at the point where the zero crossings began to be altered.
Figures 4 and 5 show the results Licklider obtained using articulation
tests as the measure of intelligibility.

Fig. 3. Characteristics of (A) peak clipper, (B) center clipper, (C) linear rectifier.

Fig. 4. Effects of peak and center clipping on speech in noise.

Fig. 5. Effect of linear rectification on speech in noise.
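
The effect of the two clippers on the zero crossings can be checked with a short sketch. The test wave, the clipping level of 0.3 and the function names are assumptions made for illustration; they do not reproduce Licklider's equipment.

```python
import math

def peak_clip(x, level):
    """Pass the center of the wave; limit anything beyond +/-level."""
    return max(-level, min(level, x))

def center_clip(x, level):
    """Remove the center of the wave; pass only what exceeds +/-level."""
    if x > level:
        return x - level
    if x < -level:
        return x + level
    return 0.0

def zero_crossing_indices(samples):
    """Indices n where the sign differs between samples n-1 and n."""
    return [n for n in range(1, len(samples))
            if (samples[n] > 0) != (samples[n - 1] > 0)]

# Assumed test wave: two tones, 200 samples.
wave = [math.sin(2 * math.pi * 3 * (n + 0.5) / 200)
        + 0.4 * math.sin(2 * math.pi * 7 * (n + 0.5) / 200)
        for n in range(200)]
peak_clipped = [peak_clip(x, 0.3) for x in wave]
center_clipped = [center_clip(x, 0.3) for x in wave]

same_after_peak = (zero_crossing_indices(peak_clipped)
                   == zero_crossing_indices(wave))
same_after_center = (zero_crossing_indices(center_clipped)
                     == zero_crossing_indices(wave))
```

Peak clipping never changes the sign of a sample, so every zero crossing stays where it was; center clipping replaces the low-level portions by zero and so moves them.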

In 1948, Licklider, together with I. Pollack, applied himself to a 
further study of the effects of various types of processing on speech 
intelligibility (13). They investigated the effects of integrating, 
differentiating, and clipping of the wave form on speech intelligibility 
without noise. Figure 6 illustrates the effects of various combinations 
of these steps on a sine wave and a speech wave, as far as appearance in 
the time domain is concerned. This study discovered the following: 

(a) Differentiation and integration alone do not affect intelligibility
to a significant degree.

(b) Infinite (very hard) clipping alone causes a decrease of 
intelligibility of about ten percent below (a). 

(c) Infinite clipping preceded by differentiation caused no significant
decrease in intelligibility.

(d) Infinite clipping preceded by differentiation followed by
integration yielded the same results as (c).

(e) Infinite clipping followed by differentiation had no effect on 
intelligibility other than that caused by the clipping alone, but the 
quality of the resulting speech was worse. 

(f) Infinite clipping followed by integration caused no further 
degradation of intelligibility over clipping alone, but the quality of 
the speech was improved. 

(g) Infinite clipping preceded by integration resulted in very poor
intelligibility, with scores 70% below those of (a).

(h) Infinite clipping preceded by integration followed by differentiation
resulted in even poorer scores, 80% below those of (a).

Figure 6. Schematic illustration of the effects of the distortions upon sine waves and upon speech waves.

The integrator and differentiator used in these tests are shown in
figures 7 and 8. Differentiation serves to "tilt" the spectrum up. It
introduces six db less attenuation for each octave increase in frequency. 
Integration has the opposite effect, tending to tilt the spectrum down- 
ward six db per octave. Looking again at fig. 1, we see that the inten- 
sity of the high frequencies in speech is much less than that of the low 
frequencies in natural speech. When we differentiate then clip we are 
emphasizing the highs before clipping. Thus in the clipped wave, the 
highs, which carry much of the intelligibility, are less likely to be 
masked by noise. We are of course changing the quality of the speech 
in doing this. When we integrate before clipping, we do the opposite and 
the highs can be completely lost. Since clipping alone tends to bring 
the lows down closer to the highs in intensity, clipping followed by 
integration will result in more natural sounding speech. On the other 
hand, clipping followed by differentiation will result in worse speech
quality since the normal ratio of intensities is further changed.
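
The six db per octave tilt follows directly from the gain of an ideal differentiator, which is proportional to frequency. The sketch below assumes ideal networks; the circuits of figures 7 and 8 can only approximate this behavior over a limited band, and the function names are illustrative.

```python
import math

def differentiator_gain_db(freq_hz):
    """Gain of an ideal differentiator, |H| = 2*pi*f, in db."""
    return 20.0 * math.log10(2.0 * math.pi * freq_hz)

def integrator_gain_db(freq_hz):
    """Gain of an ideal integrator, |H| = 1 / (2*pi*f), in db."""
    return -differentiator_gain_db(freq_hz)

def tilt_per_octave_db(gain_db, freq_hz):
    """Change in gain when the frequency is doubled."""
    return gain_db(2.0 * freq_hz) - gain_db(freq_hz)

up = tilt_per_octave_db(differentiator_gain_db, 1000.0)
down = tilt_per_octave_db(integrator_gain_db, 1000.0)
```

Doubling the frequency changes the gain by 20 log10(2), about 6.02 db, upward for the differentiator and downward for the integrator, independent of the starting frequency.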

Thus we can say that infinite clipping preceded by differentiation
can be used to reduce speech to a bivariate code and integration can be 
used to retrieve natural speech. However it has been found that differ- 
entiation before clipping raises the peak to average ratio of the wave by 
4 db to 18.5 db (18). Thus we would have to clip harder and amplify more 
after clipping. Since we are interested mainly in intelligibility, it is 
doubtful whether this differentiation is worth it. We can see that in- 
tegration before clipping is just the opposite of what we want to do with 
a speech wave. 

So far we have discussed clipping only with reference to a fixed 
signal to noise ratio, or with reference to no noise at all. Pollack 
discovered that infinite peak clipping improved intelligibility for a
given signal to noise ratio until high signal to noise ratios were
reached (19). This decrease in the benefits of clipping is expected
since, as we have seen, infinite clipping does reduce intelligibility
by about ten percent with no noise present. This can be explained by
considering the distortion introduced by clipping as noise. Then, once
the actual noise falls below a certain level, the noise introduced by clipping
will outweigh the benefits gained by clipping (18). In later studies (20)
Pollack investigated the effect of clipping on speech further and found 
that clipping was definitely beneficial at poor signal to noise ratios. 

For a five db signal to noise ratio he determined that when the peak of 
the speech wave was clipped 24 db, in order to achieve the same intelligi- 
bility the gain had to be increased to 13 db, resulting in an improvement 
of 11 db. 

As has been pointed out previously, it would also be nice if the 
bandwidth of speech could be reduced. Investigations have been carried
out to determine the effects of limiting the frequencies of the speech 
wave form. Among these were those carried out by Egan and Wiener at the 
Harvard Psycho-Acoustical Laboratories. These results show that intelli- 
gibility scores vary only about eight percent below the full bandwidth 
case when the speech frequencies are limited to 340 and 3900 Hz. As long 
as the pass band for speech is in this range intelligibility does not 
suffer. The important thing is that most of the formant regions must be 
included in the pass band (7). Figure 9 shows the effect of filtering 
on the intelligibility of speech. 

Fig. 9. Intelligibility of band-limited speech.

It has been determined that if speech is limited to a given band of
frequencies, the intelligibility of a clipped relative to an unclipped
signal is a function of the signal to noise ratio alone (19). We have
seen how clipping alone introduces a decrease in intelligibility at high
signal to noise ratios. This effect is shown to decrease if the lower
frequencies of the speech are removed prior to clipping (19). The lower
frequencies contain nearly all voice fundamentals. The formant regions, 
however, are at harmonics of the voice fundamentals. The clipping pro- 
cess, as we will see in section 8, introduces harmonics of the frequencies 
contained in the original wave. Thus if the lower frequencies are present 
when a speech wave is clipped, the harmonics generated by the clipping 
process lie right where the formant regions should be and thus alter them. 
Also, as we shall see in section 8, the clipping process introduces inter- 
modulation products among the frequencies present in the original wave. 
These products will also lie in or near the formant regions if the low 
frequencies are present in the unclipped wave. Thus we can see that the
"noise" generated by clipping can be reduced by removing frequencies below
the highest expected voice fundamental, about 300 Hz. We cannot completely
eliminate frequency distortion caused by clipping. Harmonics
and intermodulation products from all frequencies present in the wave to 
be clipped will appear as undesired frequency components in the clipped
wave. The thought occurs that perhaps some particular type of clipper
can be found that will reduce these undesired components. Section 8 is
devoted to a presentation of an idea along these lines and to showing 
intelligibility test results comparing two divergent types of clippers. 

5. R-F Speech Processing. 

So far in the discussion of speech processing we have only been con- 
sidering operations on the speech wave at audio frequencies. In communi- 
cations systems, however, we usually intend to translate our intelligence 
to radio frequencies before transmitting it through any appreciable noise.
Focusing our attention on radio frequency processing we see that the
single sideband system of modulation lends itself very well to a study of
such processing. Here we have an opportunity to study the effects of 
clipping at three places in the system; at the audio frequencies, at r-f, 
but with the double sideband signal, and at r-f with the single sideband 
signal. In fact, an extensive study at the Montana State College in 1962 
did just that (27). In this project clipping of various degrees was performed
at each point in a single sideband system; at audio, double sideband,
and single sideband, with appropriate postclipping filtering to
regain bandwidth. The processed signals were mixed with varying degrees
of noise and signal intelligibility of the wave after detection was 
measured. In addition, combinations of clipping at all three places 
were tested, as well as various methods of achieving high clipping levels 
such as clipping one-half the desired amount, filtering, and then clip- 
ping the other half. 

The results of this study show that single sideband clipping yields 
significantly higher intelligibility scores than do audio or double sideband
clipping, or any combination of the three. When clipping at single
sideband is done, the frequencies being clipped are the same ones as in
the original audio wave, but after they have been translated to radio
frequencies. Now the formant regions, for instance, no longer bear harmonic
relationships to each other. When we clip at r-f, the harmonics
and intermodulation products are "splattered" over a much wider frequency
range, so it is possible to filter out all but those occurring immediately
around the carrier frequency. Thus, when the wave is demodulated we
have many fewer undesired components present.
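
The advantage of clipping after translation can be seen from simple frequency arithmetic. The sketch below considers only third-order products of two tones; the tone frequencies, the carrier frequency and the function names are assumptions made for illustration, and higher odd orders are ignored.

```python
def third_order_products(f1, f2):
    """Frequencies |m*f1 + n*f2| with |m| + |n| = 3, excluding zero."""
    found = set()
    for m in range(-3, 4):
        n_abs = 3 - abs(m)
        for n in (n_abs, -n_abs):
            f = abs(m * f1 + n * f2)
            if f > 0:
                found.add(f)
    return sorted(found)

def in_band(freqs, low, high):
    """Keep only the components a band-pass filter would pass."""
    return [f for f in freqs if low <= f <= high]

F1, F2 = 500, 700        # audio tones in Hz (assumed)
CARRIER = 100_000        # carrier in Hz (assumed)
LOW, HIGH = 300, 3000    # pass band in Hz

# Clipping at audio: harmonics and sums fall inside the audio band.
audio_products = in_band(third_order_products(F1, F2), LOW, HIGH)

# Clipping the translated tones: most products land near three times
# the carrier, and the filter around the carrier removes them.
rf_products = in_band(third_order_products(CARRIER + F1, CARRIER + F2),
                      CARRIER + LOW, CARRIER + HIGH)
rf_products_at_audio = [f - CARRIER for f in rf_products]
```

For these two tones, six third-order products fall in the pass band when the clipping is done at audio, but only two (at 300 and 900 Hz after demodulation) survive the filter when the clipping is done on the translated wave.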

In double sideband clipping we have twice as many frequencies pre- 
sent in the wave to be clipped and so end up with many more undesired 
components too close to the carrier to filter out without removing our 
intelligence. 

In single sideband clipping we do have a repeaking problem as a re- 
sult of the filtering. In the Montana study this was observed to reach 
four db for very hard clipping. However, this is still a considerable 
saving over the original 17.5 db peak to average ratio of unclipped single
sideband speech (18). 

If a speech wave is infinitely clipped at the audio level and is 
used to modulate a single sideband wave with an r-f pass band of f + 300 
to f + 3000, the peak to average ratio of the resulting single sideband 
signal is about 7.3 db (18). Thus, not only do we have more distortion 
present with audio clipping, but we do not achieve as low a peak to 
average power ratio as with single sideband clipping. 

The effects are so well recognized now that the Collins Radio Company,
in its single sideband manual, categorically states that speech
clipping at audio frequencies "is of no practical value in a single
sideband transmitter" (1).

Returning to the Montana study for a moment, this group points out
that iterative clipping, that is clip, filter, clip again, has no advantage
over single sideband clipping in one stage followed by filtering to
regain bandwidth. In addition various combinations of differentiation,
integration and clipping were investigated with no significant result (27). 

The Voice of America radio has used single sideband clipping to 
achieve a 9 db improvement in signal to noise ratio in combating jamming 
(11). Single sideband clipping has been applied to amateur radio also 
with excellent results (24). 

The above discussion of speech processing at radio frequencies was 
with reference to a system wherein the noise is introduced at the radio 
frequencies, that is, a system which is concerned with transmitting a
radio frequency wave through a noisy channel. But consider a peak power
limited system where the noise is introduced at the audio frequencies,
such as a public address system or the "one MC" and "21 MC" systems aboard
U.S. Navy ships. We have seen that it would be advantageous to
perform clipping on the audio wave to improve intelligibility. But might
it not be feasible to introduce a device into the system in which the
signal is translated to a radio frequency, clipped, filtered, then translated
back down to the audio frequencies? Should not this process result
in even greater intelligibility, since filtering the clipped wave removes
distortion caused by clipping? This idea will be discussed
and investigated in section 9.

6. Intelligibility Measure: The Articulation Test. 

We have seen how various types of speech processing used in the past
affect speech intelligibility, and we have mentioned two additional ideas
that we will discuss further on. But nothing has yet been said about
how speech intelligibility is measured.

The most commonly accepted method of testing the intelligibility 
of a speech processing system is the articulation test. First developed
by the Bell Telephone Co. (9), these tests consist of trained listeners
listening to a selected list of sounds, words, or sentences and recording what
they hear. The results are compared with the lists actually transmitted 
through the system under test and a mean articulation score is computed. 
This is compared against known scores achieved using other systems to 
determine the relative merits of the system under test with respect to 
intelligible transmission or reproduction of speech. 
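
The scoring step of an articulation test can be sketched as follows. The word list, the responses and the function names are hypothetical, and the score is assumed to be the percentage of words recorded correctly, averaged over listeners.

```python
def articulation_score(sent, heard):
    """Percentage of words on the transmitted list recorded correctly."""
    if len(sent) != len(heard):
        raise ValueError("each transmitted word needs one response")
    correct = sum(1 for s, h in zip(sent, heard)
                  if s.strip().lower() == h.strip().lower())
    return 100.0 * correct / len(sent)

def mean_articulation_score(sent, responses):
    """Average of the individual listeners' scores for one list."""
    scores = [articulation_score(sent, heard) for heard in responses]
    return sum(scores) / len(scores)

# Hypothetical four-word list and two listeners' answer sheets.
sent = ["bat", "ship", "mile", "rope"]
responses = [["bat", "chip", "mile", "rope"],
             ["bat", "ship", "mile", "robe"]]
mean_score = mean_articulation_score(sent, responses)
```

Each listener in this made-up example misses one word in four, so both individual scores and the mean are 75 percent.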

There are many ways to conduct articulation tests. The test results
shown in sections 8 and 9 were obtained using the methods
described by the Harvard Psycho-Acoustical Laboratory study, "Articulation
Testing Methods II" (16). In these tests phonetically balanced
word lists were used. These are lists in which speech sounds occur with
approximately the same frequency as they occur in the English language,
and the words are so chosen that there are no very easy or very difficult 
words in each list. That is, all the words are of uniform, intermediate 
difficulty. This eliminates "dead wood" words which would always be missed 
or always be heard correctly and thus give no information on intelligibil- 
ity. 

Word lists rather than sentence lists or sound lists were used for 
the following reasons: Sound lists require a very careful "talker" and
very well trained listeners. Neither was readily available. Sentence
lists are easier than word or sound lists in this respect, but the time 
needed to give and grade tests composed of sentence lists was considered 
excessive. Twenty phonetically balanced word lists were used. The order 
of the words on each list was randomized with the aid of a table of random
numbers and the order of some of these lists was reversed to give
additional lists. Care was taken to ensure that the listeners did not 
hear a list and its inverse version within too short a time, and when it 
was necessary to use a list for the second or third time care was also 
taken to make sure a sufficient amount of time had elapsed so that the 
listeners were not able to recognize the order of the words. A total of 
32 lists were generated. Samples of these are given in Appendix II. The 
Harvard study contains all twenty of the original lists, with the words 
in alphabetical order. 
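
The preparation of the test lists can be sketched in a few lines. A seeded pseudo-random generator is used here in place of the table of random numbers, and the words and function names are hypothetical.

```python
import random

def randomized_list(words, seed):
    """Return the words in a shuffled order that the seed reproduces."""
    rng = random.Random(seed)
    shuffled = list(words)
    rng.shuffle(shuffled)
    return shuffled

def inverse_list(word_list):
    """Return the same list read in reverse order."""
    return list(reversed(word_list))

# Hypothetical alphabetical list; the real lists hold fifty words each.
alphabetical = ["bat", "mile", "rope", "ship", "tan", "wire"]
test_list = randomized_list(alphabetical, seed=1)
reversed_list = inverse_list(test_list)
```

Keeping the seed with each generated list makes it possible to regenerate the same order later when the responses are graded.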

For each word list the peak list word was determined. This is the 
word which resulted in the highest amplitude for each list. This word 
was used to determine the peak signal for each list in order to set the 
clipping level C and the signal to noise ratio A. , defined below. A list 
of the peak list words and their relative amplitudes is contained in 
Appendix II. 

These word lists were initially recorded with a signal to noise 
ratio of 45 db on a Berlant Concertone tape recorder. The microphone 
used was an Altec 660A dynamic. A peak reading meter on the recorder and 
a Tektronix 515A oscilloscope was used to keep the recording voice at a 
constant level. 

Both series of tests described in sections 8 and 9 involve clipping 
and signal to noise ratios. Since we are concerned with random noise and 
peak power limited systems these parameters were defined as: 

^ = signal to noise ratio = 201og^QE^/E^ 

C = clipping level = 201og^QEg/E^ 

where 

E^ = peak signal at point where noise is introduced 
Eg = peak signal after clipping
= r.m.s. noise voltage 

The noise voltage was generated by a General Radio Company type 
1390-B random noise generator. E^ was measured by a calibrated meter on 
the face of the generator which was connected directly across its output, 
and E_ were measured with an oscilloscope, using the peak list words.
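
Since the signal to noise ratio is defined from a peak signal voltage and an r.m.s. noise voltage, the noise generator setting needed for a given ratio follows from one line of arithmetic. The sketch below is an illustration under that definition; the function names and the voltages used are assumptions.

```python
import math

def ratio_db(numerator_volts, denominator_volts):
    """Voltage ratio expressed in db."""
    return 20.0 * math.log10(numerator_volts / denominator_volts)

def signal_to_noise_db(peak_signal_volts, rms_noise_volts):
    """Peak signal to r.m.s. noise ratio in db."""
    return ratio_db(peak_signal_volts, rms_noise_volts)

def noise_volts_for(peak_signal_volts, target_snr_db):
    """R.m.s. noise voltage that gives the target ratio for a given peak."""
    return peak_signal_volts / 10.0 ** (target_snr_db / 20.0)

# Assumed example: a 2 volt peak list word and a 20 db target ratio.
noise_setting = noise_volts_for(2.0, 20.0)
check = signal_to_noise_db(2.0, noise_setting)
```

With a 2 volt peak, a 20 db ratio calls for 0.2 volt r.m.s. of noise, and feeding that value back into the definition returns 20 db.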

Each test consisted of two of the phonetically balanced word lists
of fifty words each. Each word was given as the last word of a carrier
sentence. The carrier sentence used was "The word you should write is
______." Only the word under test, always the last word in the sentence,
was recorded by the listener. The carrier sentence was used for
two reasons (16). First, the listener is prepared for the test word and 
the missing of words due to inattention is reduced. Second, the carrier 
sentence helps to keep the voice level even while recording the lists. 

A space of three to four seconds between carrier sentences was found to 
be adequate. As recommended in the Harvard study (31), six listeners 
were used. In the tests described in section eight these were U.S. Navy 
enlisted men, all of about 22 years of age. The minimum educational back- 
ground of this group was three years of college training. Unfortunately 
this group was not available for the test described in section nine. In 
these tests 5 listeners were used, three of whom were U.S. military offi- 
cers and college graduates, one of whom was a U.S. Navy enlisted man with 
some college training and one of whom was a U.S. Navy enlisted man with a 
high school education. No significant differences in the scores of these 
listeners were noted. 

To avoid fatigue the testing procedures were as follows: The tests
were grouped into sessions of three tests each, each test being of about
14 minutes duration. Between each test the listeners were given about
one minute to adjust headsets, chairs and so on. After each session,
which lasted around 45 minutes, a 15 minute break was given. No more 
than three consecutive sessions were held before stopping for lunch or 
quitting for the day. 

The listening facility was in a small quiet room. Each listening 
position was numbered and consisted of a chair, a writing space, a volume 
control and a headset. The headsets were standard 300 ohm communications 
headsets used by the Navy. Foam earpads were added to each for comfort
and to help shield against noise.

The listeners recorded what they heard on forms like that shown in 
Appendix II. In order to ensure that the positions did not affect the
scores, the average rank of the scores made at each position was calcu- 
lated. Similarly to check for significant differences in the listeners, 
the average rank of each listener’s scores was also determined. These 
two figures were made independent by having the listeners shift positions 
after each test, thus ensuring that no listener stayed at one position 
too long. These results, shown in Appendix II, were such that there was
no substantial difference in listeners or positions.
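
The average rank check can be sketched as follows. The scores, the rotation of three listeners through three positions, and the rule that tied scores share the average of their ranks are all assumptions made for illustration.

```python
def ranks(scores):
    """Rank scores from 1 (highest); tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result

def average_rank(tests, field):
    """Average rank per listener or per position over all tests."""
    totals, counts = {}, {}
    for test in tests:
        for record, rank in zip(test, ranks([r["score"] for r in test])):
            key = record[field]
            totals[key] = totals.get(key, 0.0) + rank
            counts[key] = counts.get(key, 0) + 1
    return {key: totals[key] / counts[key] for key in totals}

# Hypothetical scores: three listeners rotate through three positions.
tests = [
    [{"listener": "A", "position": 1, "score": 90},
     {"listener": "B", "position": 2, "score": 80},
     {"listener": "C", "position": 3, "score": 70}],
    [{"listener": "A", "position": 2, "score": 85},
     {"listener": "B", "position": 3, "score": 88},
     {"listener": "C", "position": 1, "score": 80}],
    [{"listener": "A", "position": 3, "score": 92},
     {"listener": "B", "position": 1, "score": 75},
     {"listener": "C", "position": 2, "score": 84}],
]
by_listener = average_rank(tests, "listener")
by_position = average_rank(tests, "position")
```

Because every listener visits every position, a position with a consistently poor average rank would point to the equipment rather than to the person sitting there.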

All listeners' scores are given in Appendix II for each series of
tests. Further details on each series of tests may be found in section 
eight or nine and in Appendix II. 

7. Other Intelligibility Measures. 

While the articulation test is the most widely accepted method of 
determining intelligibility, as well as the most obvious, work has been 
done on other methods as well. These methods are based generally on the
idea that intelligibility is a function of how well the running power
spectrum of the wave is preserved by the system under test. In one case
(22), equipment was built and tested which compared the running power
spectrum of the speech before and after processing and calculated an
intelligibility index. This index seemed to compare favorably with articulation
test scores. In another case (26), devices were designed to
measure the average number of zero crossings of the speech wave. From 
this information an index of intelligibility was calculated. 

Neither of these two methods seems to have found general acceptance. 
Hence for this project the more conventional articulation test was used.

8. Gradual and Abrupt Clipping.

As we have seen, speech clipping at audio frequencies can be used
as a means to reduce the peak to average ratio of speech waveforms in
peak power limited systems. We have seen how such clipping can be very
beneficial in systems where intelligibility in the presence of noise is 
of paramount importance, while the quality of the speech heard by the 
listener is of secondary importance. 

Usually one thinks of a clipper as a device having the characteristics
shown in figure 10. Here the output $e_o$ is a faithful reproduction
of the input $e_i$ up to the point where $e_o = C$. After this point $e_o = C$
no matter how large $e_i$ becomes. This will be referred to as an abrupt
clipper, where C is the clipping level.

One can, however, perform clipping with a device with a characteristic
such as that shown in fig. 11. Here clipping begins almost as soon
as $e_i$ becomes greater than zero and $e_o$ reaches some "saturation" point
C, beyond which it remains constant no matter how big $e_i$ becomes. This
will be called a gradual clipper.
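
The two characteristics can be written as input-output functions. In the sketch below the abrupt clipper is an ideal limiter, and a hyperbolic tangent is assumed as a stand-in for the gradual characteristic; it is not the measured curve of any particular device, and the function names are illustrative.

```python
import math

def abrupt_clip(e_in, level):
    """Ideal abrupt clipper: linear up to +/-level, constant beyond."""
    return max(-level, min(level, e_in))

def gradual_clip(e_in, level):
    """Assumed gradual clipper: compresses from the start, saturates at level."""
    return level * math.tanh(e_in / level)

C = 1.0  # clipping level (assumed units)
inputs = [0.25, 0.5, 1.0, 2.0, 10.0]
abrupt_out = [abrupt_clip(e, C) for e in inputs]
gradual_out = [gradual_clip(e, C) for e in inputs]
```

Below the clipping level the abrupt clipper returns its input unchanged, while the gradual one already returns slightly less; for large inputs both settle at C. Both functions are odd, so positive and negative peaks are treated alike.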

Figure 10. Abrupt Clipper



Figure 11. Gradual Clipper 
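
The two characteristics can be sketched numerically. The following is a minimal illustration, not the diode circuits used in this project: the abrupt clipper is modeled as an ideal limiter, and the gradual clipper is modeled with a hyperbolic tangent, which is only an assumed stand-in for a curve that bends from the origin and saturates at C. The function names are hypothetical.

```python
import numpy as np

def abrupt_clip(e_i, c):
    """Ideal abrupt clipper: output follows the input until |e_i| reaches c."""
    return np.clip(e_i, -c, c)

def gradual_clip(e_i, c):
    """Illustrative gradual clipper: compresses from the origin, saturates at c.

    tanh is an assumed shape, not the measured 1N34A characteristic.
    """
    return c * np.tanh(np.asarray(e_i, dtype=float) / c)

# Tabulate both characteristics for a clipping level C = 1.
for e_in in np.linspace(-3.0, 3.0, 7):
    print(f"e_i = {e_in:5.1f}   abrupt = {float(abrupt_clip(e_in, 1.0)):6.3f}   "
          f"gradual = {float(gradual_clip(e_in, 1.0)):6.3f}")
```

Both functions are odd in the input, which is the property the series expansion later in this section relies on.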



It is the purpose of this section to investigate the relative merits 
of the abrupt and gradual clipper as applied to speech. The criteria 
used will be the intelligibility of the clipped wave in the presence of 
various degrees of noise with various degrees of clipping. 

This investigation was prompted by a remark in an article by Middle- 
ton to the effect that gradual clipping has less effect on the spectrum 
of Gaussian noise than abrupt clipping (15). Davenport has determined 
experimentally that the probability distribution for the noise-like un- 
voiced sounds is approximately Gaussian (4), so it would seem that grad- 
ual clipping would have some advantage over abrupt clipping. 

First it was decided to determine the amount of intermodulation dis- 
tortion introduced by each type of clipper. In order to do this tests 
were made on a clipped two tone signal. Tones of 1500 and 2500 Hz. of 
equal amplitude were combined and clipped by each type of clipper at 
various clipping levels. The intermodulation components present in the 
clipped wave were then measured with a wave (spectrum) analyzer. 
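
The measurement just described can be imitated in software. The sketch below is only an illustration under stated assumptions: the sample rate, the ideal abrupt clipper, and the choice of clipping at half the peak are all assumed, and an FFT stands in for the wave analyzer. Harmonics above half the sample rate alias back into the spectrum, so the printed levels are approximate.

```python
import numpy as np

FS = 48_000   # assumed sample rate, Hz
N = 48_000    # one second of samples, so FFT bin k sits at exactly k Hz

def two_tone(f1=1500.0, f2=2500.0):
    """Two equal-amplitude tones, as in the test described above."""
    t = np.arange(N) / FS
    return np.cos(2 * np.pi * f1 * t) + np.cos(2 * np.pi * f2 * t)

def abrupt_clip(x, c):
    """Ideal symmetric abrupt clipper with clipping level c."""
    return np.clip(x, -c, c)

def db_down(y, freq_hz, ref_hz=1500.0):
    """Level of the component at freq_hz in db below the component at ref_hz."""
    spectrum = np.abs(np.fft.rfft(y)) / N
    ref = spectrum[int(round(ref_hz))]
    level = max(spectrum[int(round(freq_hz))], 1e-15)  # floor avoids log(0)
    return 20.0 * np.log10(ref / level)

x = two_tone()
y = abrupt_clip(x, 0.5 * np.max(np.abs(x)))  # clip at half the peak (assumed level)
for f in (500, 3500, 4500, 5500, 6500, 7500, 8500, 9500):
    print(f"{f:5d} Hz: {db_down(y, f):6.1f} db down")
```

Because the clipper is an odd function, only odd-order products appear; a component such as 1000 Hz (the second-order difference tone) stays at the numerical noise floor.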

The gradual clipper consisted of two 1N34A germanium point contact
diodes, arranged back-to-back and unbiased. The clipping characteristic
of this device is shown in fig. 12. For the abrupt clipper the same
diodes were used, each reverse biased by one volt. The characteristic
of this clipper is shown in fig. 13.




Figure 12. Clipping characteristic, 1N34A, no bias 




Appendix I shows the equipment setup used in these tests together 
with a description of the instruments used. 

Since the clipper characteristics are odd functions, they can be
approximated by an infinite series containing only odd terms, such as:

$$e_o = a_1 e_i + a_3 e_i^3 + a_5 e_i^5 + \cdots$$

Considering only the first five terms of such a series we see that for an
input of the form:

$$e_i = A\cos\omega_1 t + B\cos\omega_2 t$$

the output will contain the following frequencies (18):

$$\omega_1,\ \omega_2,\ 3\omega_1,\ 3\omega_2,\ 2\omega_1 \pm \omega_2,\ \omega_1 \pm 2\omega_2,\ 5\omega_1,\ 5\omega_2,\ 4\omega_1 \pm \omega_2,\ 3\omega_1 \pm 2\omega_2,\ 2\omega_1 \pm 3\omega_2,\ \omega_1 \pm 4\omega_2,\ \ldots$$

For the 1500 and 2500 Hz. tones used these frequencies are:

1500, 2500, 4500, 7500, 5500, 500, 6500, 3500, 7500, 

12,500, 8500, 3500, 9500, 500, 10,500, 4500, 11,500, 

8500, . . . (all Hz.) 
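
The list above can be checked by enumerating every odd-order combination of the two tones. This is a small sketch; the function name and the cutoff at fifth order are choices made here for illustration, matching the products written out in the list.

```python
from itertools import product

def odd_order_products(f1, f2, max_order=5):
    """Frequencies |m*f1 + n*f2| for which |m| + |n| is odd and <= max_order."""
    freqs = set()
    for m, n in product(range(-max_order, max_order + 1), repeat=2):
        order = abs(m) + abs(n)
        if order % 2 == 1 and order <= max_order:
            freqs.add(abs(m * f1 + n * f2))
    return sorted(freqs)

print(odd_order_products(1500, 2500))
# [500, 1500, 2500, 3500, 4500, 5500, 6500, 7500, 8500, 9500, 10500, 11500, 12500]
```

Several entries in the text's list coincide (for example 500 Hz arises from both the third-order and fifth-order products), which is why the enumeration returns only thirteen distinct frequencies.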

Table I shows the relative amplitudes of these frequency components in
db down from the fundamentals when the two tone signal was clipped with
the indicated type of clipper. In addition the db difference between the
two clippers (gradual minus abrupt) of each component is shown. We assume
that we want to retain the two tones in the original signal and that
everything else is clipping "noise" which we desire to minimize.

It appears that from the standpoint of intermodulation distortion 
there is very little difference between the two types of clippers. 

Next it was desired to see if either clipper introduced a signifi- 
cantly larger harmonic content when clipping a single tone. A tone of 
200 Hz. was chosen to simulate a sound in the range of speech frequencies. 
Table II shows the results of clipping this tone with each type of clip- 
per. 
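
As a point of reference for such single tone results, an ideal abrupt clipper driven very hard turns a sine wave into nearly a square wave, whose n-th odd harmonic lies 20 log n db below the fundamental. The sketch below checks this limit numerically. It assumes an ideal limiter, an arbitrary sample rate, and a clipping level chosen only to make the clipping very heavy; it does not model the diode clippers.

```python
import numpy as np

FS = 48_000   # assumed sample rate, Hz
N = 48_000    # one second of samples, so FFT bin k sits at exactly k Hz

def clipped_tone(freq_hz, clip_level):
    """Unit-amplitude sine clipped symmetrically at clip_level (abrupt clipper)."""
    t = np.arange(N) / FS
    return np.clip(np.sin(2 * np.pi * freq_hz * t), -clip_level, clip_level)

def harmonic_db_down(y, fundamental_hz, harmonic):
    """Level of the given harmonic in db below the fundamental."""
    spectrum = np.abs(np.fft.rfft(y)) / N
    ref = spectrum[int(round(fundamental_hz))]
    level = max(spectrum[int(round(fundamental_hz * harmonic))], 1e-15)
    return 20.0 * np.log10(ref / level)

y = clipped_tone(200.0, clip_level=0.001)   # very heavy clipping, nearly square
for n in (3, 5, 7, 9):
    print(f"harmonic {n}: {harmonic_db_down(y, 200.0, n):5.2f} db down, "
          f"square-wave limit {20 * np.log10(n):5.2f} db")
```

Even harmonics are absent because the clipper characteristic is odd, consistent with the harmonic numbers listed in Table II.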

Here we see that the abrupt clipper does introduce slightly higher
harmonic components, especially at the higher frequencies. The difference
between the two clippers is small until the higher harmonics are
reached. These harmonics, however, are so small that they probably
couldn't be detected by the ear. It remains to be seen whether these
small differences in intermodulation distortion and harmonic distortion
are sufficient to cause a difference in intelligibility, especially if
the clipped signal is band limited.

Clipping
level             3.8 db                       6.8 db                      17.7 db

Freq.    Gradual  Abrupt   Gradual    Gradual  Abrupt   Gradual    Gradual  Abrupt   Gradual
(Hz)     clipper  clipper  - abrupt   clipper  clipper  - abrupt   clipper  clipper  - abrupt

 500      21.8     19.0      2.8       17.2     13.5      3.7       13.1     10.5      2.6
3500      22.0     20.8      1.2       17.0     14.8      2.2       13.4     10.0      3.4
4500      38.0     28.0     10.0       28.0     27.0      1.0       20.5     16.0      4.5
5500      22.0     21.2      0.8       18.0     14.0      4.0       13.2     15.0     -1.8
6500      22.0     21.1      0.9       17.0     14.8      2.2       13.6     10.0      3.6
7500      36.0     28.9      7.1       39.5     29.0     10.5       31.2     29.8      1.4
8500      46.0     54.2     -8.2       40.3     32.0      8.3       36.1     35.4      0.7
9500      38.8     42.2     -3.4       28.5     30.0     -1.5       20.5     21.8     -1.3

Table I. Distortion components from two tone tests, in db down from
fundamental.

Clipping
level                        6.0 db                      12.0 db                      24.0 db

Freq.             Gradual  Abrupt   Gradual    Gradual  Abrupt   Gradual    Gradual  Abrupt   Gradual
(Hz)   Harmonic   clipper  clipper  - abrupt   clipper  clipper  - abrupt   clipper  clipper  - abrupt

 200     1st        0        0        0          0        0        0          0        0        0
 600     3rd       18.0     13.4      4.6       14.0     11.0      3.0       12.2     10.2      2.0
1000     5th       29.2     27.0      2.2       21.5     18.4      3.1       18.0     15.0      3.0
1400     7th       40.2     42.2     -2.2       27.2     22.8      4.4       21.3     18.2      3.1
1800     9th       51.2     36.2     15.0       32.1     28.2      3.9       24.8     20.8      4.0
2200    11th       72.0     41.2     30.8       46.8     35.6     11.2       27.1     22.8      4.3
2600    13th       72.0     56.2     15.8       41.0     38.6      2.4       29.1     24.8      4.3
3000    15th        *       48.9                45.1     39.1      6.0       31.0     26.2      4.8
3400    17th        *       50.0                49.2     44.2      5.0       34.0     27.9      6.1

*Too small to measure

Table II. Single tone clipping results, in db below fundamental.

In order to determine whether either clipper results in increased
intelligibility it was decided to conduct articulation tests as described
in section six. The word lists were played into the clippers at the
levels necessary to obtain the desired clipping levels. The clipped
signal was filtered with a pass band of 300 to 3000 Hz. Then noise from
the noise generator, filtered to the same bandwidth as the speech, was
introduced at a level A, defined in section six. The clipping levels
(C in section six) chosen were 0, 12 db, 24 db, and 33 db. The A's were
3 db, 6 db, 12 db, and 18 db. The resulting signal was recorded on tape
and was played to the listeners later.

Figures 14 (A), (B), (C), and (D) show the results of the articulation
tests using the 1N34A's unbiased as the gradual clipper and the
1N34A's with a 1 volt bias as the abrupt clipper. Without the benefit
of statistical analysis, one could say that there is very little
difference between the two clippers. One might be tempted to say that the
abrupt clipper yields slightly higher intelligibility scores than the
gradual one. Actually, however, only the sets of points 16 and 17 on
fig. 14(B) and 22 and 10 on fig. 14(6) show a statistically significant
difference. To determine this, a two-sided Mann-Whitney U test was used.
This test is one of the most powerful that can be used on data of this
nature (23). The null hypothesis, $H_0$, is that the samples of the two
sets of scores being investigated came from the same population. A
significance level, $\alpha$, is chosen and the test determines the
probability, if the null hypothesis is true, of obtaining a value of U
as extreme as the one observed. If this probability exceeds the
significance level, then the null hypothesis is accepted. A significance
level of 0.01 was chosen for this data. For a complete description of the
Mann-Whitney U Test, with examples, and a discussion of significance
levels, see Appendix III.

Figure 14. ARTICULATION TEST RESULTS
(Legend: 1N34A zero bias, gradual clipper; 1N34A one volt bias, abrupt
clipper.)

With the data obtained as described above, and using the Mann-Whit- 
ney U Test with a 0.01 significance level, we see that of the twelve 
sets of data only two caused the null hypothesis to be rejected. Thus 
it can be concluded that there is no significant difference in the two 
sets of data, and the fact that the abrupt clipper appears better is just 
a result of chance. 

To confirm this, further tests were run using a different gradual
clipper composed of two unbiased 1N69A diodes whose clipper characteristic
is shown in fig. 15.

Figure 15. Clipping characteristic, 1N69A, zero bias

The results of these articulation tests are shown compared with the
abrupt clipper results in fig. 16 (A), (B), (C), and (D). Using the
Mann-Whitney test again with a significance level of 0.01, again only
two sets of scores, marked 28 and 16 and 23 and 35, show significant
differences. Note that the differences in these tests are in opposite
directions. The point labeled spurious in fig. 16(C) is considered too
high and was not used. We see that the abrupt clipper no longer has
higher scores.

Figure 16. ARTICULATION TEST RESULTS
(Legend: 1N69A zero bias, gradual; 1N34A one volt bias, abrupt.)

Thus it is safe to conclude that there is no significant difference
between gradual and abrupt clippers as regards speech intelligibility.

In the next section, another scheme for speech processing will be
considered.

9. Radio Frequency Clipping to Improve Audio Signal Intelligibility. 

As we have seen, it is quite well accepted practice to clip a single 
sideband speech wave in order to improve its peak to average value ratio 
while retaining intelligibility. This has application in systems in which 
it is necessary to transmit the radio frequency wave through a noisy chan- 
nel. In many applications it is desired to transmit speech at audio fre- 
quencies through noisy channels. As mentioned before, examples of peak 
power limited systems in which this is done are ordinary public address 
systems. 

We have also seen that it would be advantageous to perform clipping
on the audio wave directly to improve intelligibility. But we have noted
that a great number of harmonic and intermodulation distortion components
are formed by this clipping process. To reduce this distortion we can
translate the audio wave to a radio frequency, say as an upper sideband
signal, clip it, and then filter it to regain the original upper sideband
bandwidth. The distortion components introduced by the clipping are now
separated by frequencies of the order of magnitude of the carrier, with
the exception of the lowest order terms. Passing the clipped r-f wave
through a filter such as an upper sideband mechanical filter with a pass
band of the order of magnitude of the audio range, say 3 KHz., will
eliminate all but the desired audio and these lowest order distortion terms.

It remains to be seen whether the amount of repeaking involved in the 
filtering and frequency translation of the clipped wave back down to audio 
cancels out the gain in intelligibility due to the reduction in distortion. 

To determine the validity of the above statements, a device which
will be referred to as an "R-F Speech Processer" was constructed. A block
diagram of this device is shown in figure 17, and detailed diagrams of
each component are contained in Appendix IV. The audio input signal is
translated to a double sideband signal by the balanced modulator, using 
the 455 KHz. L-C oscillator to provide the carrier. The lower sideband 
is removed by the first upper sideband filter. The signal is then am- 
plified and clipped by the r-f amplifier-clipper. This signal is filtered 
by the second upper sideband filter and returned to audio by the product 
detector, again using the 455 KHz. L-C oscillator to insert the carrier. 
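
The signal path just described can be imitated with an idealized baseband model. The sketch below is an illustration under several assumptions and is not the hardware of Appendix IV: the single sideband signal is represented by its complex envelope (the analytic signal), the r-f clipper is modeled as a limiter acting on the envelope, and the mechanical filters are replaced by ideal 300 to 3000 Hz filters. All function names and the sample rate are hypothetical.

```python
import numpy as np

FS = 48_000   # assumed sample rate, Hz
N = 48_000    # one second of samples, so FFT bin k sits at exactly k Hz

def analytic_signal(x):
    """Keep only positive frequencies: complex envelope of an upper sideband signal."""
    weights = np.zeros(len(x))
    weights[0] = 1.0
    weights[1:len(x) // 2] = 2.0
    weights[len(x) // 2] = 1.0          # Nyquist bin; len(x) is assumed even
    return np.fft.ifft(np.fft.fft(x) * weights)

def sideband_filter(a, f_lo=300.0, f_hi=3000.0):
    """Idealized (brick-wall) stand-in for the upper sideband mechanical filter."""
    freqs = np.fft.fftfreq(len(a), d=1.0 / FS)
    return np.fft.ifft(np.fft.fft(a) * ((freqs >= f_lo) & (freqs <= f_hi)))

def rf_process(x, clip_level):
    """Translate up, clip the envelope, filter, and translate back to audio."""
    a = sideband_filter(analytic_signal(x))
    envelope = np.maximum(np.abs(a), 1e-12)
    a_clipped = a * np.minimum(1.0, clip_level / envelope)
    return np.real(sideband_filter(a_clipped))

def af_process(x, clip_level):
    """Clip directly at audio, then band-limit to the same 300-3000 Hz."""
    clipped = np.clip(x, -clip_level, clip_level)
    return np.real(sideband_filter(analytic_signal(clipped)))

def db_down(y, freq_hz, ref_hz):
    """Level of the component at freq_hz in db below the component at ref_hz."""
    spectrum = np.abs(np.fft.rfft(y)) / N
    level = max(spectrum[int(round(freq_hz))], 1e-15)
    return 20.0 * np.log10(spectrum[int(round(ref_hz))] / level)

t = np.arange(N) / FS
tone = np.cos(2 * np.pi * 500.0 * t)
print("a-f third harmonic:", round(db_down(af_process(tone, 0.25), 1500, 500), 1), "db down")
print("r-f third harmonic:", round(db_down(rf_process(tone, 0.25), 1500, 500), 1), "db down")
```

For a single 500 Hz tone, audio clipping leaves a third harmonic at 1500 Hz inside the pass band, while in this model the r-f path leaves only the tone itself, since the harmonics of the clipped r-f wave fall near multiples of the carrier. Intermodulation products of the lowest order between closely spaced tones still fall in band, as the text notes.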




Figure 17. R-F Speech Processer 

To compare the intermodulation distortion generated by the speech
processer, and to measure the repeaking involved in the filtering and
frequency translation of the clipped wave, two-tone tests were used.
The same two tones used in section 8, 1500 and 2500 Hz., were used here.
Table III shows the results of these tests in the r-f column, while the
results of the audio clipping with the gradual clipper from section 8 are
shown for comparison in the a-f column. In Table III we see quite
distinctly that the R-F Speech Processer causes considerably less
intermodulation distortion than the audio clipping. The results of the
repeaking measurements are shown in Table IV. Here we see that no serious
repeaking occurs in the filtering and translation of the clipped r-f wave
to audio frequencies. (The repeaking of a 20 db clipped audio wave
filtered from 300-3000 Hz. is 4.2 db (28)).
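
Repeaking can be illustrated with a short numerical sketch. Here it is taken to mean the amount, in db, by which the peak of the filtered wave exceeds the clipping level; that definition, the ideal band-pass filter, the sample rate, and the test tone are all assumptions made for this example.

```python
import numpy as np

FS = 48_000   # assumed sample rate, Hz
N = 48_000    # one second of samples

def band_limit(x, f_lo=300.0, f_hi=3000.0):
    """Ideal 300-3000 Hz band-pass filter applied in the frequency domain."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / FS)
    spectrum[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

def repeaking_db(clipped, clip_level):
    """Peak of the filtered wave relative to the clipping level, in db."""
    return 20.0 * np.log10(np.max(np.abs(band_limit(clipped))) / clip_level)

t = np.arange(N) / FS
clip_level = 0.05
clipped = np.clip(np.sin(2 * np.pi * 500.0 * t), -clip_level, clip_level)
print(f"repeaking: {repeaking_db(clipped, clip_level):.2f} db")
```

Removing the upper harmonics of a heavily clipped wave lets the remaining components add to a peak above the clipping level, which is why some of the peak-to-average improvement gained by clipping is given back after filtering.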

In order to determine the effect of this processing on intelligibility,
it was decided to conduct articulation tests with speech processed
in this manner. Using the notation introduced previously, 10 tests were
conducted, with r-f clipping levels of 12 and 24 db and A's of 3, 6, 12,
and 18 db and the maximum A available with each clipping level. Because
of the small number of tests involved the pre-recorded method of
testing was not used. Instead, the word lists described in section 6
were played through the speech processer directly into the listener's
headsets for each condition described above. Further details on the
equipment setup used in these tests are given in Appendix II.

The results of these tests are shown in figure 18. Further details
on test results may also be found in Appendix II. In figure 18 we have
taken the average of the three audio clipping test scores obtained in
section 8 for each condition shown and plotted them on the same axes as
the r-f clipping articulation scores. It can be seen that in each case
the r-f clipped speech is more intelligible than the speech clipped and
filtered at audio.

Figure 18. ARTICULATION TEST RESULTS
(Legend: average scores, audio clipping; R-F clipping.)

Clipping level        3.8 db            6.8 db           17.7 db

Frequency (Hz)      a-f     r-f       a-f     r-f       a-f     r-f

     500           21.8    60.0      17.2    47.1      13.1    40.5
    3500           22.0    37.5      17.0    28.5      13.4    21.5
    4500           38.0    61.0      28.0    57.2      20.5    50.5
    5500           22.0    62.0      18.0    59.0      13.2    52.5
    6500           22.0    55.0      17.0    50.4      13.6    47.0
    7500           36.0    57.0      39.5    52.5      31.2    52.8
    8500           46.0     -        40.3     -        36.1    61.1
    9500           38.8     -        28.5     -        20.5     -

Table III. Intermodulation distortion of two tones of 1500 and
2500 Hz. by r-f and a-f clipping, in db down from
fundamentals.

Clipping Level (db)     Repeaking (db)

        3.8                  0.6
        6.8                  1.6
       17.7                  5.1
       26.3                  6.2

Table IV. Repeaking associated with filtering and translation
to audio of r-f wave clipped to levels shown.

Using the highest set of the three audio clipping scores and applying
the U-test, again at a level of 0.01, we find that only two sets of
points (r-f and a-f) do not show a statistically significant difference.
These are marked A and A' on figure 18c. Thus we can conclude that
processing speech with our r-f speech processer is indeed advantageous.
The average improvement in articulation over the audio processing is
20.5%.

In addition to the above, tests were run with the maximum A available
through the processer to determine the effect of the r-f processer alone
on intelligibility. At 12 db of clipping the best A obtainable was 36.5
db, while at 24 db of clipping it was about 30 db (the value listed for
that test in Appendix II). The articulation scores obtained under these
conditions were 93% for the 12 db case and 90% for the 24 db case. When
we consider that even under the best conditions a few words will be
missed by the best listener, we can realize that these scores really
indicate that r-f clipping and filtering alone have an almost negligible
effect on intelligibility. In fact the loss of intelligibility that did
occur could be attributed to distortion introduced in the balanced
modulator or product detector and might be independent of the actual
clipping and filtering process.

10. Conclusions.

We have seen that as long as the formant regions are not too severely
distorted or the zero crossings of the time waveform are not radically
altered, we can do a lot to speech to improve its characteristics
vis-a-vis our communications systems while not impairing its
intelligibility.

We have noted that this is due to the natural redundancy of speech.

In our investigation of clipping we have discussed the "noise"
introduced by the clipping process itself. We have discussed and
investigated two ideas for the minimization of this noise. One of these,
the idea of gradual versus abrupt clipping, we found to be of no
practical value, except that we now know that if we are given a choice we
might as well avoid the need for biasing and use a gradual, unbiased
diode clipper rather than an abrupt, biased one, since they will result
in the same level of intelligibility. The other idea, of processing the
speech at r-f, shows merit. We found that an increase of 20% in
intelligibility could be achieved over ordinary audio clipping by this
method. This confirms the ideas about the "noise" introduced by clipping
and shows how it is reduced substantially by the filtering of the clipped
r-f wave.
APPENDIX I



SINGLE AND TWO TONE TESTS

1. Single tone tests. 

The equipment arrangement used for the single tone tests is shown 
below: 




The Hewlett-Packard 200AB audio oscillator, when set at 200 Hz., provided
the following output:

Frequency (Hz)     db

     200            0
     400          -56.0
     600          -67.0

The clipper chassis for the gradual clipper is shown below:

(Circuit diagram: gradual clipper chassis.)

For the abrupt clipper the circuit below was used.

(Circuit diagram: abrupt clipper chassis.)

The bias was provided by Hewlett-Packard 721A's, which were adjusted to
give a symmetrical one volt clipping level.

The H.P. wave analyzer has an accuracy of 1% + 5 cps in frequency and
± 5% in voltage.

2. Two tone tests.

The equipment arrangement used here was as shown below:



The output of the two tone generator, taken at point A, with no clipper
attached, across a 10k ohm load was:

Frequency (Hz)    db       Frequency (Hz)    db       Frequency (Hz)    db

    1500           0           3500        -69.0          6500        -72.0
    2500           0           4500        -71.1          7500        -60.2
    3000         -68.0         5000        -72.0

The clippers used in these tests were identical with those used in the
single tone tests.
APPENDIX II



ARTICULATION TEST DETAILS

1. Word lists.

Below are shown two examples of the phonetically balanced word lists
used in the articulation tests.

        Word List #17                     Word List #32

 1. flag       26. read           1. fast        26. rouge
 2. thank      27. year           2. soak        27. wise
 3. chess      28. lit            3. clog        28. pad
 4. club       29. hoof           4. did         29. judge
 5. phone      30. smart          5. roast       30. sigh
 6. odd        31. give           6. retch       31. in
 7. birth      32. cud            7. beard       32. eye
 8. carve      33. mass           8. click       33. pew
 9. boost      34. root           9. cart        34. rout (rowt)
10. grace      35. throne        10. joke        35. souse
11. foe        36. ditch         11. gang        36. fair
12. weak       37. wipe          12. tilt        37. wash
13. arch       38. clown         13. ace         38. crate
14. gate       39. sip           14. hump        39. seed
15. itch       40. wild          15. mow (mo)    40. walk
16. crowd      41. spud          16. bare        41. skid
17. troop      42. ice           17. duke        42. lid
18. beef       43. key           18. through     43. pack
19. nerve      44. toad          19. puss        44. theme
20. with       45. noose         20. web         45. quip
21. fume       46. rude          21. get         46. salve
22. bit        47. pact          22. brass       47. robe
23. fuse       48. than          23. gob         48. slush
24. ten        49. fluff         24. slice       49. flash
25. nuts       50. chest         25. ramp        50. cork


2. Peak list words.

Word List    Peak list word    Relative amplitude

    1.          bask               1.0
    2.          perk               1.0
    3.          fern               1.1
    4.          start              1.05
    5.          thrash             1.15
    6.          check              1.2
    7.          rack               1.0
    8.          cloak              1.1
    9.          good               1.05
   10.          thud               1.1
   11.          kept               0.95
   12.          kept               1.0
   13.          scout              0.90
   14.          dope               1.0
   15.          dumb               0.8
   16.          look               1.0
   17.          ditch              1.0
   18.          ditch              1.0
   19.          lap                0.8
   20.          put                1.0
   21.          dull               0.9
   22.          crutch             1.0
   23.          out                1.2
   24.          foot               1.15
   25.          soap               0.95
   26.          dead               0.85
   27.          wreck              0.85
   28.          tire               0.70
   29.          shock              0.85
   30.          thorn              1.0
   31.          gyp                0.80
   32.          route              1.0


3. Listener's average rank on all tests.

               Section 8 Tests                  Section 9 Tests

Listener      A    B    C    D    E    F       A    B    C    D    E

Ave. Rank    3.9  2.4  3.9  2.5  2.5  3.3     2.1  3.2  2.9  2.5  4.2

4. Position's average rank on all tests.

               Section 8 Tests                  Section 9

Position      1    2    3    4    5    6       Not done

Ave. Rank    3.0  2.2  3.3  2.9  3.7  3.6

5. The tests were given in random order. The list below shows the order
in which the tests were given.

      Section 8            Section 9

 37   39   25   41             2
 38   10    9   32             1
 35   24   19   30             3
 22   27   40   12             4
 18    4   29   31             9
 17    7   14   42            10
  ?    6   15                  5
  2   36    3                  7
 16   34   11                  6
 13    5   33                  8
 21   20   28                 11
 12   23   26                 12

6. The table below shows further details of the articulation testing
described in section eight. The clipper designation 1N34A/0 means the
1N34A diode with zero bias. The 1N34A/1 means the 1N34A diode with one
volt reverse bias.

Test        Listener Scores              Clipper     C     A    Word Lists
  #     A    B    C    D    E    F  Ave   Used      db    db      Used

  1    50   55   53   50   67   57   55  1N34A/0    12    18     25,26
  2    42   44   34   43   43   42   41  1N34A/0    12    12     27,28
  3    21   29   27   24   26   27   26  1N34A/0    12     6     1,9
  4    14   12   12   22   18   14   15  1N34A/0    12     3     7,9
  5    53   69   67   65   69   68   65  1N34A/0    24    18     12,14
  6    48   47   54   55   57   47   51  1N34A/0    24    12     3,5
  7    27   30   28   33   28   34   30  1N34A/0    24     6     2,7
  8    18   18   16   20   20   25   20  1N34A/0    24     3     6,8
  9    82   88   85   88   82   84   85  1N34A/0    33    18     15,21
 10    66   68   63   75   70   63   68  1N34A/0    33    12     14,16
 11    58   62   54   61   63   55   59  1N34A/0    33     6     2,6
 12    36   43   21   40   53   37   39  1N34A/0    33     3     22,26
 13    61   63   59   71   53   67   62  1N34A/1    12    18     31,32
 14    51   38   42   42   43   33   42  1N34A/1    12    12     30,31
 15    17   23   10   32   23   17   20  1N34A/1    12     6     1,10
 16    10    5   10   16   17   15   11  1N34A/1    12     3     29,30
 17    73   81   67   81   83   72   76  1N34A/1    24    18     21,23
 18    56   62   53   57   54   39   54  1N34A/1    24    12     17,19
 19    49   64   40   44   51   49   50  1N34A/1    24     6     20,22
 20    19   28   25   34   34   21   25  1N34A/1    24     3     15,18
 21    89   88   85   89   92   91   89  1N34A/1    33    18     3,4
 22    70   83   78   91   84   84   92  1N34A/1    33    12     13,15
 23    55   75   75   69   72   61   68  1N34A/1    33     6     11,19
 24    58   51   49   56   54   35   51  1N34A/1    33     3     18,20
 25    67   78   71   62   71   79   71  1N69A/0    12    18     13,17
 26    57   55   45   49   52   49   50  1N69A/0    12    12     16,13
 27    20   28   23   45   23   30   28  1N69A/0    12     6     22,24
 28    21   24   22   31   28   22   23  1N69A/0    12     3     11,18
 29    86   82   70   80   79   77   79  1N69A/0    24    18     29,27
 30    70   78   67   81   81   82   77  1N69A/0    24    12     27,21
 31    47   66   38   57   60   57   45  1N69A/0    24     6     24,30
 32    26   30   30   37   40   29   32  1N69A/0    24     3     14,17
 33    83   90   88   94   87   90   89  1N69A/0    36    18     4,8
 34    59   80   66   75   71   71   70  1N69A/0    36    12     8,10
 35    45   36   37   44   48   48   47  1N69A/0    36     6     29,32
 36    32   43   33   50   37   40   39  1N69A/0    36     3     4,6
 37    53   60   55   61   61   63   59  NONE        0    18     30,31
 38    25   26   24   36   29   25   28  NONE        0    12     18,19
 39    16   18   17    9   19   10   15  NONE        0     6     1,2
 40    40   56   35   52   54   56   49  NONE        0    18     4,8
 41    25   25   23   27   28   25   26  NONE        0    12     29,30
 42     6   10    4    9   11    6    9  NONE        0     6     6,10

Tests 37, 38, and 39 were composed of speech with no processing at all
and were used as the dummy training tests. Tests 40, 41, and 42 consisted
of unclipped but filtered speech.

7. Equipment set up for recording tests of section eight. 



(Block diagram: tape recorder, clipper, 300-3000 Hz filter, and tape
recorder, with a noise generator feeding in through a second
300-3000 Hz filter.)


8. The form shown below was used for all listening tests.

NAME ________     TEST # ________     POSITION # ________

(The form provides numbered answer blanks 1 through 100, with the NAME
and TEST # headings repeated above blanks 51 through 100.)

9. Equipment arrangement for tests of section 9.

(Block diagram: tape recorder, R-F speech processer, amplifiers #1 and
#2, noise generator, and the 3 kilohm headset load.)

Amplifier #1 was the amplifier section of an ME-6D/U multimeter, with a
flat response from 15 to 250,000 Hz. Amplifier #2 was a Hewlett-Packard
450A amplifier with a flat response from 5 Hz. to 1 MHz.

The 3 kilohm load consisted of six 300 ohm headsets each connected across
a 500 ohm L-pad. The L-pads were connected in series, thus enabling each
listener to adjust volume and still present a constant 3000 ohm load to
the circuit.

10. The table below shows further details of the articulation testing
described in section nine.

Test       Listener Scores                C      A     Word Lists
  #     A    B    C    D    E    Ave.    db.    db.       Used

  1*   67   63   70   60   59    64       0     24       1,2
  2*   42   35   40   36   37    38       0     18       7,9
  3    97   95   91   95   89    93      12     36       20,21
  4    86   88   83   85   72    85      12     18       24,25
  5    69   61   67   65   53    63      12     12       30,32
  6    51   54   49   52   47    51      12      6       12,13
  7    46   38   35   42   37    40      12      3       11,14
  8    94   91   90   92   85    90      24     30       4,7
  9    87   87   89   95   81    88      24     18       8,9
 10    80   80   79   66   75    76      24     12       11,13
 11    65   55   76   67   58    64      24      6       12,14
 12    48   54   59   51   42    51      24      3       22,24

*These tests were used as training tests.
APPENDIX III



THE MANN-WHITNEY U TEST (23, 14)

1. Description.

The Mann-Whitney U test is used to determine whether two independent
sets of samples have been drawn from the same population or not. The null
hypothesis, $H_0$, is that the two sets have the same distribution. The
alternative hypothesis, $H_1$, is that one set is stochastically larger or
smaller than the other. We accept $H_1$ if the probability that one single
score from one set is larger or smaller than the other is not 1/2.

2. Method.

Call one set of scores X with scores $x_1, x_2, \ldots, x_m$ and the other
set Y with scores $y_1, y_2, \ldots, y_n$. First, the two sets of scores are
combined and the order statistic formed. Then one set, X or Y, is chosen
to form the parameter U. The value of U is given by the number of times
that a score in the set, say X, follows a score from Y. A table is
consulted giving for each set of m and n the probability that
$U \le U_0$, the value found, if $H_0$ is true. A significance level,
$\alpha$, is chosen. If the value found from the U-test table is greater
than $\alpha$ then we say that the sets X and Y came from the same
population, or that $H_0$ is true. Conversely, if this value is less than
the $\alpha$ chosen, then we say that $H_1$ is true or that X and Y are
from different populations. If the $U = U_0$ calculated is greater than
$mn/2$, then we use $U_0' = mn - U_0$ as the value of U for the table.

3. Examples.

Choose $\alpha$ = 0.01. This means that it is desired that the values of
U should be so small that the probability of their occurrence under $H_0$
is less than or equal to 0.01.

Tests 12, 24 of section eight:

X = test 12 scores = 36,43,21,40,53,37.

m = n = 6

Y = test 24 scores = 58,51,49,56,54,35.

Order statistic: 21, 35, 36, 37, 40, 43, 49, 51, 53, 54, 56, 58.

Set each belongs to: X Y X X X X Y Y X Y Y Y

To find U use set X: $U_0$ = 0 + 1 + 1 + 1 + 1 + 3 = 7

Table J on page 271 of Siegel gives $P(U \le 7 \mid H_0)$ = 0.092.

This is greater than $\alpha$, so $H_0$ is accepted ($H_1$ is rejected).

Tests 7, 19 of section eight:

X = test 7 = 27,30,28,33,28,34.

m = n = 6.

Y = test 19 = 49,64,40,44,51,49.

Order statistic: 27, 28, 28, 30, 33, 34, 40, 44, 49, 49, 51, 64.

Set each belongs to: X X X X X X Y Y Y Y Y Y

$U_0$ = 0 + 0 + 0 + 0 + 0 + 0 = 0

From the table $P(U \le 0 \mid H_0)$ = 0.002, which is less than $\alpha$,
so $H_0$ is rejected.
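
The counting in the two examples above can be reproduced with a short program. This is a sketch, not the table lookup used in this project: the function names are hypothetical, the half-count for ties is a common convention assumed here, and the probability is obtained by enumerating all arrangements of the two sets (ignoring ties) and doubling the lower tail, which is an assumed reading of how the two-sided table values were formed.

```python
from itertools import combinations

def mann_whitney_u(x, y):
    """U for set X: count of (x, y) pairs in which the X score follows the Y score.

    A tie between an X score and a Y score counts one half (assumed convention).
    """
    return sum(1.0 if xi > yi else 0.5 if xi == yi else 0.0
               for xi in x for yi in y)

def exact_two_sided_p(u, m, n):
    """Exact probability under H0 (no ties) of a U at least as extreme as u."""
    u_small = min(u, m * n - u)
    hits = total = 0
    for x_positions in combinations(range(m + n), m):
        # For the i-th X in rank order, (position - i) Y scores precede it.
        u_arrangement = sum(pos - i for i, pos in enumerate(x_positions))
        hits += u_arrangement <= u_small
        total += 1
    return min(1.0, 2.0 * hits / total)

test_12 = [36, 43, 21, 40, 53, 37]
test_24 = [58, 51, 49, 56, 54, 35]
test_7 = [27, 30, 28, 33, 28, 34]
test_19 = [49, 64, 40, 44, 51, 49]

for name, x, y in (("12 vs 24", test_12, test_24), ("7 vs 19", test_7, test_19)):
    u = mann_whitney_u(x, y)
    p = exact_two_sided_p(u, len(x), len(y))
    verdict = "reject H0" if p < 0.01 else "accept H0"
    print(f"tests {name}: U = {u:.0f}, p = {p:.3f}, {verdict}")
```

With six scores per set there are only 924 arrangements, so full enumeration is practical; for larger samples a table or a normal approximation would be the usual route.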

4. Significance level.

A significance level of 0.01 was chosen since it was felt that one
could not be too rigorous considering the relatively unsophisticated method
of testing and the size of the samples. In the study conducted at Montana
State College (27) a significance level of 0.001 was used. Lindgren
suggests levels of from 0.05 to 0.1, while Siegel uses levels from 0.001
to 0.14. Siegel, in discussing significance, gives 0.01 and 0.05 as common
values for this type of data.

5. Efficiency. The efficiency of this test is quoted by both Lindgren and
Siegel to be 0.96 asymptotically, and Lindgren quotes Hodges and Lehmann
as showing that it is always at least 0.864, thus making it one of the
most powerful of such tests.
APPENDIX IV



DETAILED SCHEMATIC DIAGRAMS OF R-F SPEECH PROCESSER

1. Detailed schematic of 455 KHz. L-C oscillator:

2. Detailed schematic of balanced modulator:

3. Detailed schematic of 455 KHz. amplifier and clipper: